SILENCE: Lightweight Protection for Privacy in Offloaded Speech Understanding
Speech serves as a ubiquitous input interface for embedded mobile devices. Cloud-based solutions offer powerful speech understanding services but raise significant concerns about user privacy. To address this, disentanglement-based encoders have been proposed to remove sensitive information from speech signals without compromising speech understanding functionality. However, these encoders demand high memory usage and computational complexity, making them impractical for resource-constrained devices. Our solution is based on a key observation: speech understanding hinges on long-term dependencies spanning the entire utterance, whereas privacy-sensitive elements are short-term dependent. Exploiting this observation, we propose SILENCE, a lightweight system that selectively obscures short-term details without damaging long-term-dependent speech understanding performance. The crucial part of SILENCE is a differential mask generator, derived from interpretable learning, that automatically configures the masking process. We implement SILENCE on the STM32H7 microcontroller and evaluate its efficacy under different attack scenarios. Our results demonstrate that SILENCE offers speech understanding performance and privacy protection comparable to existing encoders, while achieving up to 53.3× speedup and 134.1× reduction in memory footprint.
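The key idea, obscuring short local windows of an utterance while leaving its long-term structure intact, can be illustrated with a toy frame-masking sketch. The function name, window size, and masking ratio below are illustrative assumptions, not taken from the SILENCE paper:

```python
import random

def mask_short_term(frames, mask_ratio=0.3, win=4, seed=0):
    """Toy sketch: zero out a few short windows of a frame sequence.

    Most frames survive untouched, so utterance-level (long-term)
    structure is preserved while short-term details are obscured.
    All parameter choices here are hypothetical.
    """
    rng = random.Random(seed)
    out = list(frames)
    # number of short windows to obscure, derived from the masking ratio
    n_wins = max(1, int(len(frames) * mask_ratio / win))
    for _ in range(n_wins):
        start = rng.randrange(0, max(1, len(frames) - win))
        for i in range(start, start + win):
            out[i] = 0.0  # obscure this short-term span
    return out
```

A real system would mask spectral frames rather than raw samples and would learn where to mask; this sketch only shows the short-term/long-term trade-off the abstract describes.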
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed instance-specific prompts from a task-generic prompt to improve segmentation accuracy. The effectiveness of this segmentation heavily depends on the precision of these derived prompts. However, MLLMs often suffer from hallucinations during reasoning, resulting in inaccurate prompting. While existing methods focus on eliminating hallucinations to improve a model, we argue that MLLM hallucinations can reveal valuable contextual insights when leveraged correctly, as they represent pre-trained large-scale knowledge beyond individual images.
Adversarial Scene Editing: Automatic Object Removal from Weak Supervision
While great progress has been made recently in automatic image manipulation, it has been limited to object centric images like faces or structured scene datasets. In this work, we take a step towards general scene-level image editing by developing an automatic interaction-free object removal model. Our model learns to find and remove objects from general scene images using image-level labels and unpaired data in a generative adversarial network (GAN) framework. We achieve this with two key contributions: a two-stage editor architecture consisting of a mask generator and image in-painter that co-operate to remove objects, and a novel GAN based prior for the mask generator that allows us to flexibly incorporate knowledge about object shapes. We experimentally show on two datasets that our method effectively removes a wide variety of objects using weak supervision only.
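The two-stage editor described above, a mask generator and an image in-painter cooperating to remove objects, can be sketched at toy scale. Both stages here are deliberately trivial stand-ins (a value-matching mask and a mean-fill in-painter); the actual model learns both stages adversarially:

```python
def generate_mask(img, target):
    """Toy mask generator: flag pixels matching the target object value."""
    return [[1 if px == target else 0 for px in row] for row in img]

def inpaint(img, mask):
    """Toy in-painter: fill masked pixels with the mean of unmasked ones."""
    keep = [px for row, mrow in zip(img, mask)
            for px, m in zip(row, mrow) if m == 0]
    fill = sum(keep) / len(keep)
    return [[fill if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]

def remove_object(img, target):
    """Two-stage removal: first locate the object, then paint it out."""
    return inpaint(img, generate_mask(img, target))
```

The point of the split is that each stage has a well-defined job: the mask generator only localizes, the in-painter only fills, and the GAN prior mentioned in the abstract constrains the masks toward plausible object shapes.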
Supplementary for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity, Yan Liu
In this supplementary material, we provide further analyses of the mask prior in Section 1 and of similarity transfer in Section 2. We show visualization results in Section 3 and the performance variance (Figure 2) to better investigate the effectiveness of the mask prior. We treat similarity prediction as a binary classification task, in which the binary label 1 (resp. …). For the VOC test set, we repeat the above procedure; the precision, recall, and F1 scores are summarized in Table 1. We observe that the gap between the performance of the similarity network on base categories and on novel categories is negligible (e.g., F1 scores …). We take "person" … and calculate their pairwise semantic similarity scores by applying the trained similarity network. It is worth noting that the average similarity score is affected slightly by the number of outliers when the batch size increases to a large scale (e.g., 64).
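Since the similarity network is evaluated as a binary classifier, the precision, recall, and F1 scores reported in Table 1 follow the standard definitions. A minimal sketch of that computation (the function name is illustrative):

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = similar pair)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```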